EN6106 - Comprehensive Study Notes

1. Microservices Architecture

1.1 Definition and Core Principles

Definition: Microservices architecture decomposes an application that would otherwise be a monolith into independent, loosely coupled services, each running in its own process and deployed and scaled independently.

Core Principles: Single Responsibility, Decentralized Governance, Technology Diversity, Resilience, Scalability.

1.2 Monolithic vs. Microservices

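Aspect	Monolithic	Microservices
Deployment	Single deployable unit	Independently deployable services
Scalability	Difficult: the whole application is scaled together	Easy: each service is scaled independently
Fault Isolation	Poor: one failure can affect the whole application	High: failures are contained within a service
Development	Slower as the codebase grows	Faster, with teams working in parallel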

1.3 Design Principles

Single Responsibility Principle: Each service should have a single responsibility.
Right-sized Services: Avoid over-fragmentation; aim for services that are neither too large nor too small.
Decentralized Governance: Teams have autonomy over their services.

1.4 Communication Protocols

Synchronous: REST, Thrift.
Asynchronous: AMQP, STOMP, MQTT.
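
The difference between the two styles can be illustrated with a minimal, standard-library-only sketch. It does not use REST or AMQP themselves: a plain function call stands in for a blocking request/response, and an in-process `queue.Queue` stands in for a message broker. The names `get_price` and `run_async_demo` are hypothetical and chosen only for illustration.

```python
import queue
import threading


def get_price(item: str) -> float:
    """Synchronous style: the caller blocks until the reply is returned."""
    prices = {"book": 12.5, "pen": 1.2}  # made-up sample data
    return prices[item]


def run_async_demo(events: list[str]) -> list[str]:
    """Asynchronous style: the producer enqueues messages and moves on;
    a separate consumer thread processes them later."""
    broker: queue.Queue = queue.Queue()  # stand-in for a message broker
    handled: list[str] = []

    def consumer() -> None:
        while True:
            message = broker.get()
            if message is None:  # sentinel: no more messages
                break
            handled.append(f"processed {message}")

    worker = threading.Thread(target=consumer)
    worker.start()
    for event in events:
        broker.put(event)  # the producer does not wait for a reply
    broker.put(None)
    worker.join()
    return handled
```

In the synchronous case the caller cannot continue until `get_price` returns; in the asynchronous case the producer only waits at `worker.join()`, which is there so the demo can return its results.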

1.5 Data Management

Each microservice owns its database; data is accessed only via service APIs.
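
As a rough sketch of this rule, the example below gives each service its own private in-memory SQLite database and lets the order service reach customer data only through the customer service's public method. The class names, table layouts, and sample row are assumptions made for illustration; in a real system the call between services would go over the network rather than being a direct method call.

```python
import sqlite3
from typing import Optional


class CustomerService:
    """Owns the customers database; nothing else touches it directly."""

    def __init__(self) -> None:
        self._db = sqlite3.connect(":memory:")  # private to this service
        self._db.execute(
            "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)"
        )
        self._db.execute("INSERT INTO customers VALUES (1, 'Ada')")

    def get_customer_name(self, customer_id: int) -> Optional[str]:
        """Public API: the only way other services can read customer data."""
        row = self._db.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return row[0] if row else None


class OrderService:
    """Owns the orders database and calls CustomerService through its API."""

    def __init__(self, customers: CustomerService) -> None:
        self._db = sqlite3.connect(":memory:")  # separate database
        self._db.execute(
            "CREATE TABLE orders "
            "(id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT)"
        )
        self._customers = customers

    def place_order(self, customer_id: int, item: str) -> str:
        # No SQL join across services: ask the owning service instead.
        name = self._customers.get_customer_name(customer_id)
        if name is None:
            raise ValueError(f"unknown customer {customer_id}")
        self._db.execute(
            "INSERT INTO orders (customer_id, item) VALUES (?, ?)",
            (customer_id, item),
        )
        return f"Order for {name}: {item}"
```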

1.6 Security

OAuth 2.0 for authorization (delegated access to resources) and OpenID Connect, an identity layer built on top of OAuth 2.0, for authentication.

1.7 Deployment

Docker: Containerization. Kubernetes: Orchestration.

Example: Netflix’s migration to microservices for scalability and reliability.

1.8 Challenges

Distributed system complexity.
Data consistency across services.
Service discovery and inter-service communication.

1.9 References

Ref 1: Spring Microservices

2. Monolithic Architecture

2.1 Key Features

Single codebase.
Tight coupling between components.
Centralized logging and monitoring.
Simple to understand and test, since everything lives in one codebase.
Fast initial development and deployment.

2.2 When to Use

Small, simple applications.
Rapid development and deployment.
Small development team.

2.3 Limitations

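Difficult to scale: the entire application must be scaled as one unit, even if only one component is under load.
Hard to upgrade and add new features, because a change in one component can affect the rest of the codebase.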
Less suited to agile development and continuous delivery practices.

3. Data Science with Python

3.1 Key Libraries

Pandas: Data manipulation and analysis.
NumPy: Numerical computing.
Matplotlib: Data visualization.
Seaborn: Statistical data visualization.
SciPy: Scientific computing.
Scikit-learn: Machine learning.

3.2 Data Science Process

Data Collection: Gather data from various sources (databases, APIs, files).
Data Cleaning: Handle missing values, remove duplicates, correct errors.
Data Exploration: Understand data through summary statistics and visualizations.
Data Modeling: Build predictive models using machine learning algorithms.
Data Interpretation: Analyze model outputs and insights.
Data Visualization: Present insights using charts, graphs, and dashboards.
Communication: Share findings with stakeholders.

3.3 Data Pipelines

ETL Process: Extract, Transform, Load.

Tools: Apache Hadoop, Spark, Informatica PowerCenter, Apache Kafka.

3.4 Requirements of Data Pipelines

Extract data from multiple relevant data sources.
Clean, transform, and enrich the data so that it is ready for analysis.
Load the data into a single source of truth, usually a data lake or a data warehouse.
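
The three requirements above map directly onto the Extract, Transform, Load steps. The following is a minimal sketch using only the Python standard library, with an inline CSV string standing in for a data source and an SQLite table standing in for the warehouse. The sample data, the `sales` table, and the function names are all assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Made-up raw data: one row has a missing amount, one row is a duplicate.
RAW_CSV = """name,amount
alice,10.5
Bob,
alice,10.5
carol,7
"""


def extract(text: str) -> list[dict]:
    """Extract: read rows from the source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: drop incomplete rows, normalise names, remove duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        if not row["amount"]:  # missing value
            continue
        record = (row["name"].strip().title(), float(row["amount"]))
        if record in seen:  # duplicate
            continue
        seen.add(record)
        cleaned.append(record)
    return cleaned


def load(records: list[tuple], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned records into the target store."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()


def run_pipeline(conn: sqlite3.Connection) -> list[tuple]:
    load(transform(extract(RAW_CSV)), conn)
    return conn.execute(
        "SELECT name, amount FROM sales ORDER BY name"
    ).fetchall()
```

Calling `run_pipeline(sqlite3.connect(":memory:"))` leaves two rows in the table, one for Alice and one for Carol; the row with the missing amount and the duplicate are dropped during the transform step.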

Example: Using Pandas for data cleaning and Matplotlib for visualization.
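
The cleaning half of this example might look like the sketch below. The DataFrame contents and the column names `city` and `temperature` are invented sample data, and `clean` is a hypothetical helper; only the pandas calls themselves (`drop_duplicates`, `dropna`, `reset_index`) are real library methods.

```python
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Remove exact duplicate rows and rows with no temperature reading."""
    return (
        df.drop_duplicates()
        .dropna(subset=["temperature"])
        .reset_index(drop=True)
    )


# Invented sample data: one duplicate row and one missing value.
raw = pd.DataFrame(
    {
        "city": ["Colombo", "Colombo", "Kandy", "Galle"],
        "temperature": [30.0, 30.0, None, 28.5],
    }
)
cleaned = clean(raw)
```

The cleaned frame could then be handed to Matplotlib, for instance with `plt.bar(cleaned["city"], cleaned["temperature"])`, to cover the visualization half.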

3.5 References

Ref 2: W3Schools Data Science Tutorial

4. Artificial Intelligence (AI)

4.1 Types of AI

Weak AI/Narrow AI: Task-specific (e.g., chatbots, recommendation systems).
Strong AI/General AI: Human-like reasoning (hypothetical).
Artificial Superintelligence (ASI): Exceeds human intelligence (hypothetical).

4.2 AI Paradigms

Turing Paradigm: A machine is judged intelligent if it can convince a human evaluator that it is human (the Turing test).
Connectionist Paradigm: Mimics human brain structure (neural networks).
Evolutionary Paradigm: Uses genetic algorithms.
Bayesian Paradigm: Probabilistic reasoning.
Fuzzy Logic: Handles vagueness by allowing degrees of truth between 0 and 1 rather than strict true/false values.
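
To make the Bayesian paradigm concrete, the sketch below applies Bayes' rule to update a belief after seeing one piece of evidence. The function name and the spam-filter numbers in the usage note are illustrative assumptions, not figures from the course material.

```python
def posterior(prior: float, likelihood: float, false_positive_rate: float) -> float:
    """Return P(H | E) using Bayes' rule.

    prior               -- P(H), belief in the hypothesis before the evidence
    likelihood          -- P(E | H), chance of the evidence if H is true
    false_positive_rate -- P(E | not H), chance of the evidence if H is false
    """
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence
```

For example, if 1% of emails are spam, a keyword appears in 90% of spam and in 5% of legitimate mail, then `posterior(0.01, 0.9, 0.05)` is about 0.154: seeing the keyword raises the probability of spam from 1% to roughly 15%.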

4.3 Subfields

Machine Learning: Algorithms that improve with data.
Deep Learning: Neural networks with multiple layers.
Natural Language Processing (NLP): Language understanding and generation.
Computer Vision: Image and video analysis.

4.4 Applications

Software-based:

NLP (Google Translate, ChatGPT).
Computer Vision (Google Photos, FaceID).
Speech Recognition (Siri, Alexa).

Hardware-based:

Robots, Autonomous Vehicles, Drones.
AI Chips (Google TPUs, NVIDIA GPUs).
Smart Home Devices (Nest, Alexa-enabled speakers).

Example: Tesla Autopilot for autonomous vehicles.

4.5 References

Ref 3: NetLogo Multi-Agent Modeling

5. Social Network Analysis (SNA)

5.1 Graph Theory Basics

Undirected Graphs: No directionality.
Directed Graphs: Directional relationships.
Centrality Measures:
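- Degree Centrality: Number of direct connections a node has.
- Betweenness Centrality: How often a node lies on the shortest paths between other nodes, i.e. how strongly it acts as a bridge between groups.
- Eigenvector Centrality: Influence of a node based on how well connected its neighbours are.
- Closeness Centrality: How near a node is to all other nodes, based on its shortest-path distances to them.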
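
Two of these measures are simple enough to compute by hand. The sketch below implements normalised degree centrality and closeness centrality for a small undirected, connected graph using only the standard library. The five-node example graph and the function names are assumptions for illustration; in practice a library such as NetworkX would normally be used instead.

```python
from collections import deque


def build_graph(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Build an undirected adjacency map from a list of edges."""
    graph: dict[str, set[str]] = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def degree_centrality(graph: dict[str, set[str]]) -> dict[str, float]:
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def closeness_centrality(graph: dict[str, set[str]]) -> dict[str, float]:
    """(reachable nodes) / (sum of shortest-path distances), via BFS.

    Assumes the graph is connected, so every node reaches every other.
    """
    result = {}
    for source in graph:
        dist = {source: 0}
        pending = deque([source])
        while pending:
            current = pending.popleft()
            for neighbour in graph[current]:
                if neighbour not in dist:
                    dist[neighbour] = dist[current] + 1
                    pending.append(neighbour)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


# Example graph: a triangle A-B-C with a tail C-D-E.
g = build_graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")])
```

In this graph node C scores highest on both measures (degree 0.75, closeness 0.8), which matches the intuition that it connects the triangle to the tail.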

5.2 Data Management

Tools: Gephi, UCINET, Pajek.

Best practices: Standardization, cleaning, documentation.

5.3 Applications

Identifying key influencers.
Detecting communities/clusters.
Analyzing information flow.
Security and law enforcement (gang activity analysis, fraud detection).

5.4 References

Ref 4: NetworkX Reference

6. Digital Forensics

6.1 Key Concepts

Branches:

Computer Forensics
Mobile Forensics
Network Forensics
Database Forensics

Digital Evidence Types:

Emails, chat logs, internet browser histories, metadata, deleted files, contents of computer memory.

6.2 Forensic Process

Identification: Gather information and identify potential evidence sources.
Collection: Secure and preserve evidence (forensic imaging).
Examination: Analyze data for relevance.
Analysis: Interpret evidence and draw conclusions.
Presentation: Report findings for legal proceedings.
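
A common way to show that a forensic image taken during the Collection step is a faithful copy is to compare cryptographic hashes of the source and the image. The sketch below assumes SHA-256 and uses in-memory byte streams as stand-ins for a disk and its image; the function names are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_stream(stream: BinaryIO, chunk_size: int = 65536) -> str:
    """Hash a file-like object in chunks so large images fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def image_matches_source(source: BinaryIO, image: BinaryIO) -> bool:
    """True if the image has the same SHA-256 hash as the source."""
    return sha256_stream(source) == sha256_stream(image)


# Stand-in data: real use would open the device and image files in 'rb' mode.
original = b"disk contents" * 1000
good_copy = io.BytesIO(original)
altered_copy = io.BytesIO(original[:-1] + b"X")
```

Here `image_matches_source(io.BytesIO(original), good_copy)` is true, while the same check against `altered_copy` is false, because changing even one byte changes the hash.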

6.3 Chain of Custody

Purpose: To document the handling and movement of evidence.

Key Points:

Document date, time, description of evidence, and handler’s name.
Prevents tampering, contamination, or loss of evidence.
Starts at collection and ends at court presentation.
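
The key points above can be sketched as a small append-only log. Each entry records the date and time, the handler's name, a description of the evidence, and what was done with it. The class and field names are assumptions for illustration, and a real system would also need tamper-evident storage, which this sketch does not provide.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)  # frozen: an entry cannot be edited after creation
class CustodyEntry:
    timestamp: datetime
    handler: str
    description: str
    action: str


@dataclass
class ChainOfCustody:
    evidence_id: str
    entries: list[CustodyEntry] = field(default_factory=list)

    def record(
        self,
        handler: str,
        description: str,
        action: str,
        timestamp: Optional[datetime] = None,
    ) -> CustodyEntry:
        """Append a new entry; entries must be in chronological order."""
        entry = CustodyEntry(
            timestamp or datetime.now(timezone.utc),
            handler,
            description,
            action,
        )
        if self.entries and entry.timestamp < self.entries[-1].timestamp:
            raise ValueError("custody entries must be in chronological order")
        self.entries.append(entry)
        return entry

    def handlers(self) -> list[str]:
        """Everyone who has handled the evidence, in order."""
        return [entry.handler for entry in self.entries]
```

The first call to `record` would be made at collection and the last at court presentation, so the list of entries covers the whole lifetime of the evidence.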

6.4 Roles in Law Enforcement

Cybercrime investigations.
Fraud investigations.
Child exploitation cases.
Terrorism investigations.

Example: Using Autopsy for forensic analysis.

6.5 References

Ref 5: Python Digital Forensics Tutorial

7. Data Visualization

7.1 Benefits

Easier identification of trends and patterns.
Improved decision-making.
An accessible way to share information with non-technical audiences.
Makes relationships between variables visible.

7.2 Tools

Matplotlib, Seaborn, Plotly, Tableau.

Example: Using Seaborn for statistical data visualization.

8. Key Exam Insights

Commonly Tested Topics:

Topic	Key Points
Microservices	Decentralized data management, independent deployment, communication protocols.
Monolithic Architecture	Single codebase, simplicity, difficulty in scaling.
Data Science with Python	Pandas, NumPy, ETL process, data visualization.
AI	Weak vs. Strong AI, AI paradigms, subfields (ML, NLP, CV).
SNA	Graph theory, centrality measures, data management tools.
Digital Forensics	Evidence types, forensic process, chain of custody.
Data Visualization	Benefits, tools, and applications.

Exam Tips

Focus on decentralized data management in microservices.
Understand the difference between monolithic and microservices architectures.
Know the steps of the data science process and key Python libraries.
Be familiar with AI paradigms and their applications.
Practice graph theory and centrality measures for SNA.
Review the forensic process and chain of custody for digital forensics.

EN6106 – Emerging Topics in Information Technology